Extreme Classification
Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products
In the last decade, it has been shown that many hard AI tasks, especially in NLP, can be naturally modeled as extreme classification problems, leading to improved precision. However, such models are prohibitively expensive to train due to the memory bottleneck in the last layer. For example, a reasonable softmax layer for the dataset of interest in this paper can easily reach well beyond 100 billion parameters (> 400 GB memory). To alleviate this problem, we present Merged-Average Classifiers via Hashing (MACH), a generic $K$-classification algorithm where memory provably scales as $O(\log K)$ without any assumption on the relation between classes. MACH is subtly a count-min sketch structure in disguise, which uses universal hashing to reduce classification with a large number of classes to a few embarrassingly parallel and independent classification tasks with a small (constant) number of classes.
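To make the construction above concrete, here is a minimal NumPy/scikit-learn sketch of the count-min-sketch view of MACH: R independent universal hashes fold the K classes into B buckets, one small B-class model is trained per hash, and decoding averages each class's bucket probability across repetitions. The sizes K, B, R and the logistic-regression base classifier are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

K, B, R = 1000, 32, 8  # K classes hashed into B buckets, R repetitions (illustrative sizes)
rng = np.random.default_rng(0)
hashes = rng.integers(0, B, size=(R, K))  # universal hashing: class -> bucket, one map per repetition

def train_mach(X, y):
    """Train R independent B-class models on hashed labels (embarrassingly parallel)."""
    return [LogisticRegression(max_iter=200).fit(X, hashes[r][y]) for r in range(R)]

def predict_mach(models, X):
    """Decode count-min style: average each class's bucket probability over repetitions."""
    n = X.shape[0]
    scores = np.zeros((n, K))
    for r, m in enumerate(models):
        probs = np.zeros((n, B))
        probs[:, m.classes_] = m.predict_proba(X)  # guard against buckets unseen in training
        scores += probs[:, hashes[r]]              # class k is scored via its bucket h_r(k)
    return scores.argmax(axis=1)
```

Because two distinct classes land in the same bucket under all R hashes with probability roughly (1/B)^R, the averaged scores can separate a class from its bucket-mates while storing only R models of B outputs each, i.e., memory logarithmic in K.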
Efficient Loss-Based Decoding on Graphs for Extreme Classification
In extreme classification problems, learning algorithms are required to map instances to labels from an extremely large label set. We build on a recent extreme classification framework with logarithmic time and space (LTLS), and on a general approach for error correcting output coding (ECOC) with loss-based decoding, and introduce a flexible and efficient approach accompanied by theoretical bounds. Our framework employs output codes induced by graphs, for which we show how to perform efficient loss-based decoding to potentially improve accuracy. In addition, our framework offers a tradeoff between accuracy, model size and prediction time. We show how to find the sweet spot of this tradeoff using only the training data. Our experimental study demonstrates the validity of our assumptions and claims, and shows that our method is competitive with state-of-the-art algorithms.
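A small sketch may help fix ideas on loss-based decoding: each class receives a ±1 codeword, one binary learner is trained per code bit, and prediction picks the class whose codeword incurs the lowest total loss over the learners' real-valued margins. The random code, exponential loss, and SGD base learners below are illustrative stand-ins; LTLS instead induces its codes from paths in a trellis graph.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

K, L = 50, 16                             # K classes, L code bits (illustrative sizes)
rng = np.random.default_rng(0)
code = rng.choice([-1, 1], size=(K, L))   # +/-1 output code, one codeword (row) per class

def train_ecoc(X, y):
    """One binary margin classifier per code bit."""
    return [SGDClassifier(loss="log_loss").fit(X, code[y, b]) for b in range(L)]

def decode(models, X, loss=lambda z: np.exp(-z)):
    """Loss-based decoding: choose the class whose codeword minimizes the
    summed loss over the L real-valued margins (exponential loss here)."""
    margins = np.column_stack([m.decision_function(X) for m in models])  # (n, L)
    # total loss of class k = sum_b loss(code[k, b] * margin[:, b])
    losses = loss(margins[:, None, :] * code[None, :, :]).sum(axis=2)    # (n, K)
    return losses.argmin(axis=1)
```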
Reviews: Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products
This paper studies the task of extreme classification with a large number of target categories. It develops a hashing-based algorithm, MACH, which trains a classifier for each hash mapping on the reduced problem with a much smaller number of target classes. The predictions of the sub-classifiers are then combined to reconstruct the final output. The proposed method is demonstrated to be both efficient and effective on multiple datasets.
Reviews: Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products
The paper presents a method for scaling up classifiers for tasks with an extremely large number of classes, with memory requirements scaling as O(log K) for K classes. The proposed model uses a count-min sketch to transform a very large classification problem into a small number of classification tasks, each with a fixed small number of classes. Each of these models can be trained independently and in parallel. Experimental results on a number of multi-class and multi-label classification tasks show that it either matches other, more resource-demanding approaches or outperforms them. The methodological contribution is significant, and it could serve as a baseline for future studies.
Navigating Extremes: Dynamic Sparsity in Large Output Space
Ullah, Nasib, Schultheis, Erik, Lasby, Mike, Ioannou, Yani, Babbar, Rohit
In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for generating efficient models. In principle, DST allows for a more memory-efficient training process, as it maintains sparsity throughout the entire training run. However, current DST implementations fail to capitalize on this in practice. Because sparse matrix multiplication is much less efficient than dense matrix multiplication on GPUs, most implementations simulate sparsity by masking weights. In this paper, we leverage recent advances in semi-structured sparse training to apply DST in the domain of classification with large output spaces, where memory efficiency is paramount. With a label space of possibly millions of candidates, the classification layer alone will consume several gigabytes of memory. Switching from a dense layer to a fixed fan-in sparse layer updated with sparse evolutionary training (SET), however, severely hampers training convergence, especially at the largest label spaces. We find that poor gradient flow from the sparse classifier to the dense text encoder makes it difficult to learn good input representations. By employing an intermediate layer or adding an auxiliary training objective, we recover most of the generalisation performance of the dense model. Overall, we demonstrate the applicability and practical benefits of DST in a challenging domain -- characterized by a highly skewed label distribution that differs substantially from typical DST benchmark datasets -- which enables end-to-end training with millions of labels on commodity hardware.
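As a rough illustration of the setup described in the abstract, the following PyTorch sketch simulates a fixed fan-in sparse classification layer by masking a dense weight matrix and applies a SET-style prune-and-regrow step. The layer sizes, fan-in, and pruning fraction are illustrative assumptions, and a real semi-structured implementation would replace the dense mask with sparse kernels.

```python
import torch
import torch.nn as nn

class FixedFanInSparseLinear(nn.Module):
    """Sparse classifier layer: each of the `out_features` label rows keeps
    exactly `fan_in` nonzero weights, simulated by masking a dense matrix."""
    def __init__(self, in_features, out_features, fan_in):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.fan_in = fan_in
        idx = torch.stack([torch.randperm(in_features)[:fan_in]
                           for _ in range(out_features)])
        mask = torch.zeros(out_features, in_features)
        mask.scatter_(1, idx, 1.0)
        self.register_buffer("mask", mask)

    def forward(self, x):
        # Masking zeroes both the inactive weights and their gradients.
        return x @ (self.weight * self.mask).T

    @torch.no_grad()
    def set_update(self, prune_frac=0.3):
        """SET step: per row, drop the smallest-magnitude active weights
        and regrow the same number at random inactive positions."""
        w = (self.weight * self.mask).abs()
        k = int(prune_frac * self.fan_in)
        for row in range(w.shape[0]):
            active = self.mask[row].nonzero(as_tuple=True)[0]
            drop = active[w[row, active].argsort()[:k]]       # weakest active weights
            self.mask[row, drop] = 0.0
            inactive = (self.mask[row] == 0).nonzero(as_tuple=True)[0]
            grow = inactive[torch.randperm(len(inactive))[:k]]
            self.mask[row, grow] = 1.0
            self.weight[row, grow] = 0.0                      # regrown weights start at zero
```

Pruning and regrowing the same number of weights per row keeps the fan-in exactly constant, which is what makes the layout amenable to semi-structured kernels; the dense mask here is only a functional simulation of that sparsity.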